Use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/51840

Title: Discovering a domain-specific schema from general-purpose knowledge base
Author(s): SILVA NETO, Everaldo Costa
Keywords: Databases; Schema discovery; Domain discovery; Entity representation
Issue date: 13-Jun-2023
Publisher: Universidade Federal de Pernambuco
Citation: SILVA NETO, Everaldo Costa. Discovering a domain-specific schema from general-purpose knowledge base. 2023. Thesis (Doctorate in Computer Science) – Universidade Federal de Pernambuco, Recife, 2023.
Abstract: General-purpose knowledge bases (KBs), e.g., DBpedia, YAGO, and Wikidata, store factual data about a set of entities. These KBs have been constructed to store cross-domain knowledge, e.g., health, entertainment, industry, sports, and arts. Most applications that use data from general-purpose KBs are domain-specific. Some tasks, such as query formulation and information extraction, require a data schema to explore the contents of a KB. However, schema-related declarations are not mandatory and, sometimes, are not provided. Therefore, these domain-specific applications face two issues: (1) they require only the subset of data that falls within the domain of interest, but general-purpose KBs hold a large volume of factual data across many distinct domains; and (2) schema-related information is lacking. In this thesis, we address the problem of domain-specific schema discovery from general-purpose KBs. Specifically, we build ANCHOR, an end-to-end pipeline that automatically identifies a domain-specific dataset and its schematic description. ANCHOR works in three steps: domain discovery, class identification, and class schema discovery. First, it extracts a specific domain by exploring the category-category mappings of the KB. From this, it identifies domain entities through entity-category mappings. Next, the class identification step discovers implicit classes within the dataset. For that, ANCHOR learns an entity representation from entity-category mappings and uses it to identify the entities' implicit classes by grouping similar entities. Finally, the class schema discovery step builds the class schema, i.e., it identifies a set of relevant attributes that best describe the entities within the same class. For that, ANCHOR runs CoFFee, an approach based on attribute co-occurrence and frequency that identifies a set of core attributes for each class discovered in the previous step.
We have performed an extensive experimental evaluation on four distinct DBpedia domains. For the class identification task, we compare ANCHOR against traditional and embedding-based baselines. The results show that, when applied to standard clustering algorithms, our entity representation outperforms the baselines and is effective for the class identification task. For the class schema discovery task, we compare CoFFee against two state-of-the-art approaches. The results show that CoFFee is effective in filtering out less relevant attributes: it selects a set of core attributes while keeping its retrieval rate high, producing a higher-quality class schema for the identified classes.
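The abstract names the pipeline steps but not their details, so the following is only a minimal sketch of the general idea, under stated assumptions, and not the thesis's implementation. It assumes category-category mappings are a parent-to-children dictionary, walks them breadth-first from a hypothetical root category, selects entities through entity-category mappings, and then keeps attributes by a plain frequency threshold. The function names, the `max_depth` and `min_support` parameters, and the toy data are all illustrative; the class identification (clustering) step and the co-occurrence part of CoFFee are omitted.

```python
from collections import Counter, deque


def discover_domain_categories(subcategories, root, max_depth=2):
    """Breadth-first walk over category-category mappings, starting at root.

    Assumption: subcategories maps a parent category to its child categories.
    """
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand categories at the depth limit
        for child in subcategories.get(category, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return seen


def domain_entities(entity_categories, domain_categories):
    """Keep entities that belong to at least one domain category."""
    return {
        entity
        for entity, categories in entity_categories.items()
        if domain_categories & set(categories)
    }


def frequent_attributes(entity_attributes, entities, min_support=0.5):
    """Frequency-only stand-in for core-attribute selection (not CoFFee itself)."""
    counts = Counter(
        attribute
        for entity in entities
        for attribute in set(entity_attributes.get(entity, ()))
    )
    total = len(entities)
    return {a for a, c in counts.items() if total and c / total >= min_support}


# Hypothetical toy data, invented for illustration only.
subcategories = {"Music": ["Musicians", "Albums"], "Musicians": ["Guitarists"]}
entity_categories = {
    "e1": ["Guitarists"],
    "e2": ["Albums"],
    "e3": ["Footballers"],
}
entity_attributes = {
    "e1": ["name", "birthDate"],
    "e2": ["name", "releaseDate"],
    "e3": ["name", "team"],
}

categories = discover_domain_categories(subcategories, "Music")
entities = domain_entities(entity_categories, categories)
core = frequent_attributes(entity_attributes, entities, min_support=0.6)
```

In the thesis, attributes are selected per discovered class rather than over the whole domain dataset; the sketch applies the threshold to all domain entities at once only to stay short.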
URI: https://repositorio.ufpe.br/handle/123456789/51840
Appears in collections: Doctoral Theses - Computer Science

Files associated with this item:
File: TESE Everaldo Costa Silva Neto.pdf
Size: 4.11 MB
Format: Adobe PDF


This file is protected by copyright



This item is licensed under a Creative Commons License